Better Guarantees for Sparsest Cut Clustering
Author
Abstract
The field of approximation algorithms for clustering is very active, and a large number of algorithms have been developed for clustering objectives such as k-median, min-sum, and sparsest cut clustering. For most of these objectives, the approximation guarantees do not match the known hardness results, and much effort is spent on obtaining tighter approximation guarantees [1, 4, 5, 8, 6, 9, 10]. However, many practical clustering problems, such as clustering proteins by function or clustering images by subject, have some unknown correct "target" clustering; in such cases the pairwise information is merely based on heuristics, and the real goal is to achieve low error on the data. In these settings, the implicit hope is that approximately optimizing objective functions such as those mentioned above will in fact produce a clustering of low error, i.e., a clustering that is close pointwise to the truth. Formally, for a set of n data points, the error of a clustering C′ = {C′_1, ..., C′_k} with respect to a target clustering C = {C_1, ..., C_k} is the fraction of points on which C and C′ disagree under the optimal matching of clusters in C to clusters in C′, i.e., err(C′) = min_σ (1/n) Σ_{i=1}^k |C_i \ C′_{σ(i)}|, where σ ranges over bijections of the cluster indices {1, ..., k}.
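The error notion above can be made concrete in a short sketch. This is a hypothetical helper, not code from the paper; representing clusterings as per-point label lists (rather than sets of clusters) is a convention of the example:

```python
from itertools import permutations

def clustering_error(target, found):
    """Fraction of points on which `found` disagrees with `target`
    under the best one-to-one matching of cluster labels.
    Each argument is a list of cluster indices in {0, ..., k-1},
    one entry per data point."""
    n = len(target)
    k = max(max(target), max(found)) + 1
    best_agree = 0
    # Brute-force over all matchings sigma of found-labels to
    # target-labels; fine for small k (the Hungarian method would
    # handle larger k).
    for sigma in permutations(range(k)):
        agree = sum(t == sigma[f] for t, f in zip(target, found))
        best_agree = max(best_agree, agree)
    return 1 - best_agree / n
```

For instance, `clustering_error([0, 0, 1, 1], [1, 1, 1, 0])` is 0.25: the best matching relabels found-cluster 1 as target-cluster 0 (and 0 as 1), leaving one of the four points misplaced.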
Similar resources
Approximate Hierarchical Clustering via Sparsest Cut and Spreading Metrics
Dasgupta recently introduced a cost function for the hierarchical clustering of a set of points given pairwise similarities between them. He showed that this function is NP-hard to optimize, but that a top-down recursive partitioning heuristic based on an α_n-approximation algorithm for uniform sparsest cut gives an approximation of O(α_n log n) (the current best algorithm has α_n = O(√(log n))). We sh...
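Dasgupta's cost function mentioned above can be sketched in a few lines. This is a minimal illustration under assumed representations of my own choosing (hierarchies as nested tuples of leaf labels, similarities as (i, j, weight) triples), not code from either paper:

```python
def leaf_sets(t):
    """Return (leaves of t, leaf sets of every subtree of t).
    A tree is a nested tuple; anything that is not a tuple is a leaf."""
    if not isinstance(t, tuple):
        return {t}, [{t}]
    leaves, all_sets = set(), []
    for child in t:
        child_leaves, child_sets = leaf_sets(child)
        leaves |= child_leaves
        all_sets += child_sets
    all_sets.append(leaves)
    return leaves, all_sets

def dasgupta_cost(tree, weighted_edges):
    """cost(T) = sum over similarity edges (i, j, w) of w times the
    number of leaves under the lowest common ancestor of i and j."""
    _, sets = leaf_sets(tree)
    cost = 0.0
    for i, j, w in weighted_edges:
        # The LCA subtree is the smallest subtree containing both leaves.
        cost += w * min(len(s) for s in sets if i in s and j in s)
    return cost
```

On the tree `((0, 1), (2, 3))` with unit-weight similarity edges (0, 1), (2, 3), and (0, 2), the first two edges are separated at subtrees of 2 leaves and the last only at the root (4 leaves), so the cost is 2 + 2 + 4 = 8; a good tree cuts low-similarity pairs near the root.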
Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering
We present and analyze a new framework for graph clustering based on a specially weighted version of correlation clustering that unifies several existing objectives and satisfies a number of attractive theoretical properties. Our framework, which we call LambdaCC, relies on a single resolution parameter λ, which implicitly controls both the edge density and sparsest cut of all output clusters...
O(√n)-Approximation Algorithm For Directed Sparsest Cut
We give an O(√n)-approximation algorithm for the Sparsest Cut Problem on directed graphs. A naïve reduction from Sparsest Cut to Minimum Multicut would only give an approximation ratio of O(√n log D), where D is the sum of the demands. We obtain the improvement using a novel LP-rounding method for fractional Sparsest Cut, the dual of Maximum Concurrent Flow.
Hierarchical Clustering via Spreading Metrics
We study the cost function for hierarchical clusterings introduced by [Dasgupta, 2016], where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [Dasgupta, 2016] that a top-down algorithm returns a hierarchical clustering of cost at most O(α_n log n) times the cost of the optimal hierarchical clustering, where ...
Embedding approximately low-dimensional $\ell_2^2$ metrics into $\ell_1$
Goemans showed that any n points x_1, ..., x_n in d dimensions satisfying ℓ_2^2 triangle inequalities can be embedded into ℓ_1 with worst-case distortion at most √d. We extend this to the case when the points are approximately low-dimensional, albeit with average distortion guarantees. More precisely, we give an ℓ_2-to-ℓ_1 embedding with average distortion at most the stable rank, sr(M), of the ...
Publication date: 2009